Abstract. VLSI technologists are fast developing wafer-scale integration. Rather than partitioning a silicon wafer into chips as is usually done, the idea behind wafer-scale integration is to assemble an entire system (or network of chips) on a single wafer, thus avoiding the costs and performance loss associated with individual packaging of chips. A major problem with assembling a large system of microprocessors on a single wafer, however, is that some of the processors, or cells, on the wafer are likely to be defective. This paper surveys practical procedures for integrating “around” such faults. The procedures are designed to minimize the length of the longest wire in the system, thus minimizing the communication time between cells. Although the underlying network problems are NP-complete, all the procedures can be proved reliable by assuming a probabilistic model of cell failure.

Key Words: channel width, fault-tolerant systems, matching, probabilistic analysis, spanning tree, systolic arrays, travelling salesman problem, tree of meshes, VLSI, wafer-scale integration, wire length.
1. Introduction


VLSI  technologists  are  fast  developing  wafer-scale  integration  [25].  Rather  than  partitioning 
a  silicon  wafer  into  chips  as  is  usually  done,  the  idea  behind  wafer-scale  integration  is  to  assemble 
an  entire  system  (or  network  of  chips)  on  a  single  wafer,  thus  avoiding  the  costs  and  performance 
loss  associated  with  individual  packaging  of  chips.  A  major  problem  with  assembling  a  large 
system  of  microprocessors  on  a  single  wafer,  however,  is  that  some  of  the  processors,  or  cells,  on 
the  wafer  are  likely  to  be  defective,  or  dead.  In  this  paper,  we  survey  algorithms  for  constructing 
systolic  arrays  from  the  live  cells  of  a  silicon  wafer. 

Laser-programming  the  interconnect  of  a  wafer  is  one  promising  means  of  achieving  wafer- 
scale  integration.  This  technology  was  pioneered  at  IBM  [21]  and  pursued  in  the  direction  of 
wafer-scale integration at MIT Lincoln Laboratory [25]. Figure 1 shows a scanning electron microscope photograph of a portion of a wafer with programmable interconnect. Laser welds can be made between two layers of metal, and by using the beam at somewhat higher power, wires can be cut. Defective components can thus be avoided by programming connections between only the good components.

Figure 1. A close-up of laser-programmable interconnect.

Figure 2 shows a typical organization of a wafer-scale system with programmable interconnections. The components are organized as a matrix of cells, and between the cells are channels through which the interconnect runs. Figure 3 is a close-up of the channel structure. At the intersection of a horizontal and vertical channel, laser-programmable connections can make a horizontal and a vertical wire electrically equivalent. Between two cells, connections can be made from the wires in the channel to the inputs and outputs of the two cells. Given that the interconnect is programmable, we shall adopt a usage of the term “wire” to mean an electrically equivalent portion of the programmable interconnect.

Systolic arrays [12, 13, 20] are a desirable architecture for VLSI because all communication is between nearest neighbors. A realization of a systolic array as a wafer-scale system may lose this advantage if all nearest neighbors of a processor are dead, however, because a long wire may be needed to connect electrically adjacent processors. In general, the longest interconnection between processors is the communication bottleneck of the system. Of the many possible ways in which the live cells on a wafer can be connected to form a systolic array, therefore, the one that minimizes the length of the longest wire is most desirable.

Figure 2. A wafer-scale system of cells and programmable interconnect.

Figure 3. The channel structure of a wafer-scale system.

To illustrate the subtleties inherent in configuring systolic arrays, consider the problem of constructing a linear (i.e., one-dimensional) array using all of the live cells in an $N$-cell wafer. Unfortunately, if we wish to minimize the length of the longest wire, the problem is NP-complete [10]. Even more discouraging is that there are some arrangements of live and dead cells for which even the optimal linear array has unacceptably long wires. Thus optimal solutions, even if they could be found quickly, are not always practical.
By assuming a probabilistic model of cell failure, however, many positive results can be proved. For example, Figure 4 illustrates a possible solution to the problem of connecting the live cells of a wafer into a linear systolic array. The live cells, which are denoted by small squares, are connected together, one after another, in a snake-like pattern. Dead cells, denoted by X’s, are skipped over. With probability at least $1 - O(1/N)$, the length of the longest wire is $O(\lg N)$, where $N$ is the number of cells in the wafer and where each cell independently has a 50 percent chance of failure.*

Figure 4. A simple means of constructing a linear systolic array from the live cells on a wafer.

This bound comes from the observation that the length of the longest wire that connects two cells in the array is at most one more than the length of the longest sequence of dead cells in the snake-like string. For a given set of $k$ cells, the probability that all are dead is $1/2^k$, and thus the probability that a given set of $2\lg N$ cells are all dead is $1/N^2$. Since there are fewer than $N$ sets of $2\lg N$ consecutive cells, the chances are thus less than one in $N$ of having to skip more than $2\lg N$ cells anywhere in the entire snake-like path of length $N$. Hence the maximum wire length is $O(\lg N)$ with probability at least $1 - O(1/N)$.
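The snake-like scheme and the run-length argument above can be checked with a small simulation. The Python sketch below is illustrative only: `snake_order`, `snake_array` and `longest_dead_run` are hypothetical names, the wafer is modeled as a square grid of booleans, and wire length is assumed to be the Manhattan distance between cell positions, in cell widths. Consecutive live cells in the array are separated along the snake by a run of dead cells plus one step, so the longest wire never exceeds the longest dead run plus one.

```python
import math
import random

def snake_order(side):
    """Yield (row, col) positions of a side-by-side wafer in snake (boustrophedon) order."""
    for r in range(side):
        cols = range(side) if r % 2 == 0 else range(side - 1, -1, -1)
        for c in cols:
            yield (r, c)

def snake_array(live):
    """Chain the live cells of a square wafer in snake order, skipping dead cells.

    live[r][c] is True for a live cell. Returns the array (a list of positions)
    and the longest wire, measured here as Manhattan distance in cell widths.
    """
    order = [(r, c) for r, c in snake_order(len(live)) if live[r][c]]
    longest = max((abs(r1 - r2) + abs(c1 - c2)
                   for (r1, c1), (r2, c2) in zip(order, order[1:])), default=0)
    return order, longest

def longest_dead_run(live):
    """Length of the longest run of consecutive dead cells along the snake."""
    best = run = 0
    for r, c in snake_order(len(live)):
        run = 0 if live[r][c] else run + 1
        best = max(best, run)
    return best

# Usage: a 64-by-64 wafer (N = 4096) whose cells each fail with probability 1/2.
random.seed(1)
side = 64
wafer = [[random.random() >= 0.5 for _ in range(side)] for _ in range(side)]
order, longest = snake_array(wafer)
print(len(order), longest, longest_dead_run(wafer), 2 * math.log2(side * side))
```

The printed values let one compare the longest wire with the $2\lg N$ threshold of the argument above for this particular random wafer.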

To say that “with probability $1 - O(1/N)$ the maximum wire length is $O(\lg N)$” is a substantially stronger statement than saying that the expected maximum wire length is $O(\lg N)$. This is because no wire can ever have length greater than $O(\sqrt{N})$, even in the worst case. Hence the expected maximum wire length is at most

$$(1 - O(1/N)) \cdot O(\lg N) + O(1/N) \cdot O(\sqrt{N}) = O(\lg N) .$$

Moreover, the chances that the maximum wire length is much greater than $O(\lg N)$ are minuscule. In particular, the probability of having to skip more than $k\lg N$ dead cells at a fixed point in the snake-like path is less than one in $N^k$. Summing over the fewer than $N$ points in the path, every wire has length at most $k\lg N$ with probability at least $1 - 1/N^{k-1}$.

* Here and throughout the paper, we use $O(f(N))$ to denote a function that is bounded above by $cf(N)$ for a fixed constant $c$ and all sufficiently large $N$. We also use $\Omega(f(N))$ to denote a function that is bounded below by $cf(N)$, and $\Theta(f(N))$ to denote a function that is bounded above by $c_1 f(N)$ and below by $c_2 f(N)$ for some fixed constants $c$, $c_1$ and $c_2$, and all sufficiently large $N$. We also use $\lg N$ to denote $\log_2 N$, $\lg^2 N$ to denote $(\lg N)^2$, and $\lg\lg^2 N$ to denote $(\lg\lg N)^2$. Lastly, $\lfloor x \rfloor$ denotes the largest integer less than or equal to $x$, and $\lceil x \rceil$ denotes the smallest integer greater than or equal to $x$.


This paper presents a survey of algorithms for realizing one- and two-dimensional systolic arrays as wafer-scale systems. Unlike many of the heuristics in the literature, the algorithms here have all been theoretically analyzed, and bounds on their quality have been mathematically proved. The analyses make the assumption that each cell fails independently with probability $p$, and for simplicity, we assume here that $p = 1/2$. We also assume for ease of explication and analysis that the width of a cell and the width of a wire are each unity. A more complete discussion of the assumptions and their generalizations can be found in [17].

The algorithms are organized to aid an engineer in picking an algorithm for implementation. We try to present enough mathematics to aid his intuition, but we do not, for the most part, include the detailed combinatorial arguments appearing in the literature that substantiate the effectiveness of the algorithms. Since programming involves many more “real-world” constraints than can be considered in an algorithmic analysis, we expect that the engineer might choose a less effective algorithm, for example, if it is easier to code. The algorithms here constitute a menu of possibilities to stimulate an intelligent design decision.

The remainder of the paper is divided into four sections. Section 2 contains basic combinatorial facts underlying the probabilistic analyses used in the literature. Section 3 gives two algorithms for integrating linear arrays. The first algorithm connects all the live cells on a wafer, and the second achieves somewhat shorter maximum wire length by connecting only a large constant fraction of the live cells. Section 4 gives five algorithms for integrating two-dimensional arrays, and includes both worst-case and probabilistic bounds. Section 5 provides a summary of the material covered in the paper and mentions some related work.

2.  Combinatorial  facts 

In the introduction, we showed that, with probability at least $1 - O(1/N)$, a sequence of $N$ cells on a wafer contains no more than $O(\lg N)$ dead cells in a row. This kind of high probability analysis underlies most of the algorithms in this paper. We shall use the term “high probability” to mean “with probability at least $1 - O(1/N)$,” where $N$ is the number of cells on a wafer. We now present some basic facts used in high probability analyses.

The first fact is the standard definition of independence.

Fact 1. Let $A$ and $B$ be independent random events. Then

$$\Pr\{A \cap B\} = \Pr\{A\} \Pr\{B\} . \qquad \blacksquare$$


The second fact bounds the probability of the union of two random events, even if the events are not independent.

Fact 2. Let $A$ and $B$ be random events. Then

$$\Pr\{A \cup B\} \le \Pr\{A\} + \Pr\{B\} .$$
Proof. This fact follows from the principle of inclusion and exclusion. We always have

$$\Pr\{A \cup B\} = \Pr\{A\} + \Pr\{B\} - \Pr\{A \cap B\} ,$$

and since $\Pr\{A \cap B\} \ge 0$, the result follows. ∎

Fact 2 provides a weak bound if the probabilities involved are large. For example, if the probabilities of the individual events are each greater than 1/2, the bound on their union exceeds 1 and is therefore trivial. When the probabilities are small, however, the bound can be useful.

The next fact bounds a linear function with an exponential. It is most useful when $x$ is near zero.

Fact 3. For all $x$ in the range $-\infty < x < \infty$, we have

$$1 + x \le e^x . \qquad \blacksquare$$


We  now  turn  to  combinatorial  theorems  that  deal  more  directly  with  the  statistics  of  faults 
on  wafers.  As  was  mentioned  in  the  introduction,  we  shall  typically  assume  that  each  cell  on  the 
wafer  fails  independently  with  probability  1/2. 


Fact 4. With high probability, a given rectangular pattern of live and dead cells of size $2\lg N$ never appears on an $N$-cell wafer.

Proof. The proof follows the analysis for the snake-like scheme in the introduction, which relies on Fact 2. The generalization from one- to two-dimensional regions is straightforward, as is the generalization from a pattern consisting solely of dead cells to an arbitrary pattern. ∎
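For a concrete sense of the union bound behind Fact 4, the sketch below counts the placements of a fixed rectangular pattern on a square wafer and multiplies by the probability $2^{-\text{size}}$ that one placement matches. The helper name `pattern_union_bound` and the choice of a 5-by-8 pattern on a 1024-by-1024 wafer are assumptions made for illustration; any pattern of $2\lg N$ cells gives a bound of at most $N \cdot 2^{-2\lg N} = 1/N$.

```python
def pattern_union_bound(side, rows, cols):
    """Fact 2 (union) bound on the probability that one fixed rows-by-cols
    pattern of live and dead cells appears anywhere on a side-by-side wafer,
    when each cell is independently live or dead with probability 1/2."""
    placements = max(0, side - rows + 1) * max(0, side - cols + 1)
    per_placement = 2.0 ** -(rows * cols)   # Fact 1: cells are independent
    return placements * per_placement

# A 1024-by-1024 wafer has N = 2**20 cells; a 5-by-8 pattern has 40 = 2 lg N cells.
n = 2 ** 20
bound = pattern_union_bound(1024, 5, 8)
print(bound, 1 / n)
```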

Of  course,  Fact  4  does  not  imply  that  no  pattern  will  occur,  only  that  the  probability  that  a 
given  pattern  occurs  is  low.  It’s  like  the  lottery:  somebody  will  win,  but  probably  not  you. 
Remarkably,  patterns  of  slightly  less  than  half  the  size  almost  always  appear  on  a  wafer. 


Fact 5. With high probability, a given rectangular pattern of live and dead cells of size $\lg N - 2\lg\lg N$ appears somewhere on an $N$-cell wafer.

Proof. Partition the wafer into $N/(\lg N - 2\lg\lg N)$ rectangular regions of size $\lg N - 2\lg\lg N$. The probability that a given one of the regions realizes the pattern is

$$2^{-\lg N + 2\lg\lg N} = \frac{\lg^2 N}{N} .$$

The probability that every region avoids the pattern is therefore

$$\left(1 - \frac{\lg^2 N}{N}\right)^{N/(\lg N - 2\lg\lg N)} \le e^{-\lg^2 N/(\lg N - 2\lg\lg N)} \le e^{-\lg N} ,$$

using Facts 1 and 3. ∎
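The chain of inequalities in the proof of Fact 5 can be evaluated numerically. In this sketch (`fact5_bounds` is a hypothetical helper), the region size is rounded down to an integer, which only makes a match in a given region more likely, and the exact avoidance probability is compared with the exponential bound obtained from Fact 3.

```python
import math

def fact5_bounds(n):
    """Numbers behind the proof of Fact 5 for an n-cell wafer with p = 1/2.

    The pattern/region size lg n - 2 lg lg n is rounded down to an integer
    (an assumption of this sketch), and the wafer is cut into n // size
    disjoint regions, each of which matches a fixed pattern with
    probability q = 2**-size independently of the others.
    """
    lg = math.log2(n)
    size = math.floor(lg - 2 * math.log2(lg))
    regions = n // size
    q = 2.0 ** -size
    p_avoid = math.exp(regions * math.log1p(-q))   # (1 - q)**regions, via Fact 1
    fact3_bound = math.exp(-q * regions)           # e**(-q * regions), via Fact 3
    return size, regions, p_avoid, fact3_bound

size, regions, p_avoid, fact3_bound = fact5_bounds(2 ** 20)
print(size, regions, p_avoid, fact3_bound)
```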


In a region of $m$ cells on a wafer, the expected number of live cells is $m/2$. The actual number will vary, however. The next fact gives tight bounds on the expected deviation.

Fact 6. Let $X$ be the random variable indicating the number of live cells in a region with $m$ cells. Then the expectation of the deviation is

$$E\left(\left|X - \tfrac{1}{2}m\right|\right) = \Theta(\sqrt{m}) . \qquad \blacksquare$$

Fact 6 tells us that the expected deviation from the mean is $\Theta(\sqrt{m})$. We shall occasionally need to bound the actual probability of some given deviation. The next fact provides such a bound.

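Fact 7. Let $X$ be the random variable indicating the number of live cells in a region with $m$ cells. Then for any $r > 0$, the probability that the deviation exceeds $r\sqrt{m}$ is

$$\Pr\left\{\left|X - \tfrac{1}{2}m\right| > r\sqrt{m}\right\} \le 2e^{-2r^2} . \qquad \blacksquare$$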
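The tail bound of Fact 7 can be compared against exact binomial probabilities. This sketch assumes the bound takes the two-sided Hoeffding form $2e^{-2r^2}$; the helper names are hypothetical, and the exact probability is computed by summing binomial coefficients with `math.comb` (Python 3.8 or later).

```python
import math

def deviation_tail(m, r):
    """Exact Pr{|X - m/2| > r * sqrt(m)} for X ~ Binomial(m, 1/2), i.e. the
    number of live cells among m cells that each survive with probability 1/2."""
    t = r * math.sqrt(m)
    hits = sum(math.comb(m, k) for k in range(m + 1) if abs(k - m / 2) > t)
    return hits / 2 ** m

def hoeffding_bound(r):
    """Two-sided Hoeffding bound 2 * exp(-2 r**2) on the same probability."""
    return 2 * math.exp(-2 * r * r)

for m in (16, 100, 400):
    for r in (1.0, 1.5, 2.0):
        print(m, r, deviation_tail(m, r), hoeffding_bound(r))
```

The exact tail is typically far below the bound for small $m$; the bound's value lies in being independent of $m$.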


We can use Fact 7 to prove a lower bound on the number of live cells in each of a collection of sufficiently large regions. The next fact shows that if each region contains $c\lg N$ cells, for some sufficiently large constant $c$, then with high probability, there are a substantial number of live cells in each of the regions.

Fact 8. For any $c > 4$, and for any particular collection of $N$ regions on an $N$-cell wafer, each with at least $c\lg N$ cells, the probability is at least $1 - O(1/N)$ that every region contains at least $\tfrac{1}{2}c\lg N - \sqrt{c}\lg N$ live cells.

Proof. Applying Fact 7 to $c\lg N$ cells of a given region with $r = \sqrt{\lg N}$, so that $r\sqrt{c\lg N} = \sqrt{c}\lg N$, the probability that the region does not contain at least $\tfrac{1}{2}c\lg N - \sqrt{c}\lg N$ live cells is $O(e^{-2\lg N}) = O(1/N^2)$. By Fact 2, the probability that any of the $N$ regions on the wafer, overlapping or not, fails to contain at least $\tfrac{1}{2}c\lg N - \sqrt{c}\lg N$ live cells is at most $N \cdot O(1/N^2) = O(1/N)$. ∎
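The constants in Fact 8 can be made concrete. The sketch below (`fact8_numbers` is a hypothetical helper) uses regions of exactly $c\lg N$ cells and assumes Fact 7 in the Hoeffding form $2e^{-2r^2}$ with $r = \sqrt{\lg N}$. The guaranteed count $\tfrac{1}{2}c\lg N - \sqrt{c}\lg N$ is positive only when $c > 4$, which is where the condition in Fact 8 comes from.

```python
import math

def fact8_numbers(n, c):
    """Threshold and failure bound in Fact 8 for an n-cell wafer.

    Each region is taken to have exactly m = c lg n cells. Fact 7 is applied
    in the (assumed) Hoeffding form 2 exp(-2 r**2) with r = sqrt(lg n), so the
    allowed deviation r * sqrt(m) equals sqrt(c) lg n.
    """
    lg = math.log2(n)
    threshold = 0.5 * c * lg - math.sqrt(c) * lg   # live cells guaranteed per region
    per_region = 2 * math.exp(-2 * lg)              # Fact 7 with r**2 = lg n
    union = n * per_region                          # Fact 2 over n regions
    return threshold, union

threshold, union = fact8_numbers(2 ** 20, 16)
print(threshold, union, 1 / 2 ** 20)
```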

3. Integrating one-dimensional arrays

With high probability, the snake-like scheme described in the introduction connects all the live cells on an $N$-cell wafer into a linear array with wires of length at most $O(\lg N)$. This section reviews two procedures that substantially improve and generalize this bound. The first connects all the live cells on a wafer with wires of length $O(\sqrt{\lg N})$, and the second connects most of the live cells with wires of constant length.

Before presenting the algorithms, we first observe that with high probability, wires of length $\Omega(\sqrt{\lg N})$ are required to connect all the live cells on a wafer. The idea is that somewhere on the wafer there is a live cell in the center of a square region of $\Omega(\lg N)$ dead cells, an observation that follows directly from Fact 5. (An example of such a region is shown in Figure 5.) Therefore, a wire of length $\Omega(\sqrt{\lg N})$ is required to link the isolated live cell to any other live cell.
Figure 5. An example of an isolated cell.

3.1.  The  patching  method 

The first algorithm for integrating a linear systolic array achieves the lower bound of $\Omega(\sqrt{\lg N})$ by partitioning the wafer into squares, forming linear arrays within each square, and then patching together the ends of the small linear arrays to yield a single linear array consisting of all the live cells on the wafer.

More precisely, the method is as follows. Partition the wafer into square regions containing $2\lg N$ cells each, as is shown by the dashed lines in Figure 6. The probability that all $2\lg N$ cells are dead in one or more of the squares is less than $1/N$ by Fact 4. Thus, with high probability, each of the squares contains at least one live cell.

Figure 6. A scheme for constructing linear arrays from all live cells on a wafer with wires of length $O(\sqrt{\lg N})$ and constant channel widths.

Construct a linear array out of the live cells in each square using a snake-like scheme on the columns of the square, except that, when an empty column is encountered, skip over it. Figure 6 shows these connections with solid lines. Since any pair of cells in the same square can be linked with a wire of length at most $2\sqrt{2\lg N}$, the wires in each array have length $O(\sqrt{\lg N})$. Next, add wires, shown by dotted lines in the figure, to connect the small arrays into one large array. Since each region contains at least one live cell, these connections can also be made with wires of length $O(\sqrt{\lg N})$. Thus, every wire in the completed linear array has length $O(\sqrt{\lg N})$ with high probability.
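The patching method is simple enough to sketch in a few lines. The Python below is an illustration under stated assumptions rather than the algorithm from the literature: `patching_array` is a hypothetical name, the square side is taken as $\lceil\sqrt{2\lg N}\rceil$, squares are visited in snake order so that consecutive squares are adjacent, and wire length is measured as Manhattan distance. If some square happens to contain no live cell, the patch wire across it can be longer; the analysis above rules this out only with high probability.

```python
import math

def patching_array(live):
    """Sketch of the patching method on a square wafer.

    live[r][c] is True for a live cell. The square side s = ceil(sqrt(2 lg N))
    is an assumed rounding of the text's 2 lg N-cell squares. Inside a square,
    live cells are chained column by column, alternating direction and
    skipping empty columns; the squares are visited in snake order and their
    small arrays concatenated (the patches).
    Returns the array, the square side s, and the longest wire (Manhattan).
    """
    side = len(live)
    s = max(1, math.ceil(math.sqrt(2 * math.log2(side * side))))
    blocks = math.ceil(side / s)
    array = []
    for bi in range(blocks):
        block_cols = range(blocks) if bi % 2 == 0 else range(blocks - 1, -1, -1)
        for bj in block_cols:
            downward = True
            for c in range(bj * s, min((bj + 1) * s, side)):
                rows = [r for r in range(bi * s, min((bi + 1) * s, side)) if live[r][c]]
                if not rows:
                    continue  # empty column: skip over it
                array.extend((r, c) for r in (rows if downward else reversed(rows)))
                downward = not downward
    longest = max((abs(r1 - r2) + abs(c1 - c2)
                   for (r1, c1), (r2, c2) in zip(array, array[1:])), default=0)
    return array, s, longest

full = [[True] * 4 for _ in range(4)]
array, s, longest = patching_array(full)
print(s, longest)
```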

3.2. The tree method

If all live cells are incorporated in a linear array using the patching method, then the maximum wire length is $\Theta(\sqrt{\lg N})$ with high probability. But the proof of the lower bound suggests that isolated cells induce the long wires. Instead of insisting that all live cells be incorporated in the linear array, suppose we only require that most of the live cells be included. This section describes a procedure that can construct a linear array from almost all of the live cells with constant-length wires.

The procedure relies on the fact that most live cells on the wafer are near each other. More specifically, it has been proved [17] that there exists a positive constant $c$ such that for any $d$, with probability $1 - O(1/N)$, at least a fraction $1 - O(2^{-cd^2})$ of the live cells on an $N$-cell wafer can be connected in a tree using wires of length at most $d$. Up to constant factors, this is the best possible bound.

The algorithm consists of two parts. First, a tree $T$ of live cells is constructed with wires of length at most $d$, and then the tree is transformed into a linear array with wires of length at most $6d$. (The constant 6 is due in part to our assumption that the width of a wire equals the width of a cell. If wire widths are substantially smaller, the constant shrinks closer to 3.)

The tree $T$ can be constructed by any of the algorithms that compute the minimum spanning tree of a graph, applied here to the graph on the live cells whose edges are the possible wires of length at most $d$. In particular, Prim’s method [1, 5, 21] can be modified to compute the spanning tree.
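A minimal sketch of the first part of the tree method, assuming Euclidean wire lengths: Prim’s method is run on the live cells, but only wires of length at most $d$ are ever considered, so the result is a forest whose largest tree is the candidate for the array. The names `short_wire_trees` and `_push_candidates` are hypothetical, and the quadratic scan over unvisited cells is chosen for clarity rather than speed.

```python
import heapq
import math

def _push_candidates(heap, cells, unvisited, u, d):
    """Push every potential wire of length <= d from cell u to an unvisited cell."""
    for v in unvisited:
        length = math.dist(cells[u], cells[v])
        if length <= d:
            heapq.heappush(heap, (length, u, v))

def short_wire_trees(cells, d):
    """Prim's method restricted to wires of length <= d.

    cells: list of (row, col) positions of live cells. Wire length is taken
    to be Euclidean distance (an assumption of this sketch). Returns a list of
    trees, each a pair (members, edges); together they cover every cell.
    """
    unvisited = set(range(len(cells)))
    trees = []
    while unvisited:
        root = min(unvisited)
        unvisited.remove(root)
        members, edges, heap = [cells[root]], [], []
        _push_candidates(heap, cells, unvisited, root, d)
        while heap:
            _, u, v = heapq.heappop(heap)
            if v not in unvisited:
                continue  # v was already attached by a shorter wire
            unvisited.remove(v)
            members.append(cells[v])
            edges.append((cells[u], cells[v]))
            _push_candidates(heap, cells, unvisited, v, d)
        trees.append((members, edges))
    return trees

cells = [(0, 0), (0, 1), (0, 3), (5, 5)]
trees = short_wire_trees(cells, 2)
largest = max(trees, key=lambda t: len(t[0]))
print(len(largest[0]), largest[1])
```

Here the cell at (5, 5) is farther than $d = 2$ from every other cell, so it is left out, which is exactly the kind of isolated cell the tree method is willing to discard.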

The construction of the linear array from the tree depends on a result by Sekanina [29] which states that the cube of a nontrivial connected graph always has a Hamiltonian circuit. Specifically, this result implies that, without regard for wire width, the linear array can be constructed using wires of length at most $3d$ by tracing over wires in the tree $T$ no more than twice each. Since every wire is traced at most twice, the channel width would (at worst) double in the resulting wiring, which increases the maximum wire length from $3d$ to $6d$ when wire widths are accounted for.

The construction is as follows. Choose a node $v$ to be the root of $T$, and let $T_1, T_2, \ldots, T_m$ be the subtrees of $v$, as is shown in Figure 7. Recursively construct linear arrays on the nodes of $T_1, T_2, \ldots, T_m$ such that no wire has length greater than $3d$, and so that the end points of the array in $T_i$ are $v_i$ and $u_i$, where $v_i$ is the root of $T_i$ and $u_i$ is a child of $v_i$ (or $u_i = v_i$ if $T_i$ consists of a single node). Then join the arrays in the subtrees by adding the following wires:

$$(v, u_1),\ (v_1, u_2),\ \ldots,\ (v_{m-1}, u_m) .$$

(These wires are shown as dashed lines in Figure 7.) Each of these wires has length at most $3d$, because its end points are at most three tree edges apart, and the resulting network is a linear array on the nodes of $T$ with end points $v$ and $v_m$, a child of $v$. Degenerate cases and the boundary conditions of the recursion are easily handled, but we do not include the details.
Figure 7. Constructing a linear array from a spanning tree.
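The recursive construction can be written directly. In the sketch below (hypothetical names; the tree is a dictionary from each node to its children), the array for a subtree runs from its root to a child of that root and is inserted in reverse, so the whole array runs from the root $v$ to a child of $v$ and consecutive nodes are at most three tree edges apart. This gives wires of length at most $3d$ before channel widths are accounted for. The helper `tree_distance` exists only to check that bound, and Python’s default recursion limit restricts this sketch to trees of height below about 1000.

```python
def tree_to_array(children, root):
    """Linear order of a rooted tree's nodes in which consecutive nodes are at
    most three tree edges apart.

    children maps each node to the list of its children. The returned list
    starts at root and, if the tree has more than one node, ends at a child
    of root.
    """
    order = [root]
    for child in children.get(root, []):
        sub = tree_to_array(children, child)  # runs from child to a child of child
        order.extend(reversed(sub))           # enter the subtree at u_i, leave at v_i
    return order

def tree_distance(children, a, b):
    """Number of tree edges between nodes a and b (used to check the bound)."""
    parent = {c: p for p, kids in children.items() for c in kids}

    def path_to_root(x):
        path = [x]
        while path[-1] in parent:
            path.append(parent[path[-1]])
        return path

    pa, pb = path_to_root(a), path_to_root(b)
    on_b = set(pb)
    meet = next(x for x in pa if x in on_b)  # lowest common ancestor
    return pa.index(meet) + pb.index(meet)

children = {0: [1, 4], 1: [2, 3], 4: [5], 5: [6]}
order = tree_to_array(children, 0)
print(order)  # [0, 3, 2, 1, 5, 6, 4]
```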

4.  Integrating  two-dimensional  arrays 

The problem of linking the live cells on a wafer to form a square two-dimensional systolic array is substantially more difficult than the corresponding problem for linear arrays. The main difficulty with constructing two-dimensional arrays is that constant-length wires no longer suffice, even if we throw away some of the live cells [8]. In fact, it has been shown [17] that with high probability, every realization of an $M$-cell two-dimensional array on an $N$-cell wafer has a wire of length $\Omega(\sqrt{\lg M})$, for all $M = \Omega(\lg^2 N)$. This result means, for example, that wires of length $\Omega(\sqrt{\lg N})$ are required to connect just one percent of the live cells.

In order for an algorithm to be effective in realizing a two-dimensional array, it must respect the two-dimensional constraints inherent in the problem. For example, consider the following naive algorithm for realizing an $M$-cell square two-dimensional array from all the live cells of an $N$-cell wafer. We assume for convenience that $M \approx N/2$ is a perfect square.

Take the top $\sqrt{M}$ live cells on the wafer, breaking ties randomly. These cells, in order left to right, make the first row of the array. Take the top $\sqrt{M}$ cells of the remainder as the second row, in order left to right, and continue similarly to make each row of the array. With high probability no row of the array contains cells from more than three rows of the wafer because Fact 8 guarantees that every row of the wafer contains nearly $\tfrac{1}{2}\sqrt{N}$ live cells.

At first, this method does not seem so bad because (Fact 4) the horizontal connections among the cells of the array have length $O(\lg N)$. The vertical connections are much worse, however. Consider a vertical line which divides the wafer into left and right halves. Fact 6 says that we can expect that the number of live cells in a given row on one side of the dividing line is at least $\Omega(\sqrt{\sqrt{M}}) = \Omega(N^{1/4})$ larger than the number on the other side. Thus, with constant probability, the midpoint of the row is at least $\Omega(N^{1/4})$ cells away from the dividing line. Two consecutive rows have their midpoints on opposite sides of the dividing line half the time, and thus, with constant probability, a wire connecting the two midpoints has length $\Omega(N^{1/4})$. Since there are $\sqrt{M}$ rows, there is a wire of length $\Omega(N^{1/4})$ between two of them with high probability. A bound of $O(N \ldots)$ for the maximum wire length in the resulting array can be shown with more detailed analysis.


4.1.  The  tree-of-meshes  method 


This section presents an algorithm which can construct a two-dimensional array from all the live cells of an $N$-cell wafer if the channels have width $\Omega(\lg N)$. All possible configurations of live and dead cells, however unlikely, can be handled by this technique, but the wire length bounds are not good. This result will be used as a subroutine in the divide-and-conquer and patching methods to achieve better bounds for wire length on average-case wafers.

We first show how an $N$-cell wafer with channels of width $O(\lg N)$ can be viewed as an $N$-leaf tree of meshes [2, 14, 15, 16]. The tree of meshes is constructed from a complete binary tree by replacing nodes of the tree with meshes and single edges of the tree with bundles of edges linking the meshes. Figure 8 shows a 16-leaf tree of meshes. The root of an $N$-leaf tree of meshes is a $\sqrt{N}$-by-$\sqrt{N}$ mesh. (We assume for simplicity that $\sqrt{N}$ is a power of 2.) The nodes at the second level are $\sqrt{N}/2$-by-$\sqrt{N}$ meshes, those at the third level are $\sqrt{N}/2$-by-$\sqrt{N}/2$ meshes, and so on until the leaves are replaced by 1-by-1 meshes.


The correspondence between the $N$-cell wafer and the $N$-leaf tree of meshes is established as follows. The first step is to construct a $\lg N$-layer three-dimensional layout [18, 26] of the tree of meshes. Fold the connections between the root of the tree of meshes and each of its two children so that the children fit naturally on a second layer over the root. Fold the connections to each of the grandchildren so that they fit naturally over the children on a third layer, and so forth. This procedure generates a $\lg N$-layer three-dimensional layout where each layer has area $N$. Next, project the three-dimensional layout onto a single layer in the manner of [31, pp. 36-38]. Locate cells of the wafer at the leaves of the tree of meshes. The crosspoints of the meshes become programmable switches, and the wires of the meshes become the wires in $\lg N$-width channels.

We now wish to make a two-dimensional array from the $M \approx N/2$ live leaves of the tree of meshes. (In general, an exact square array is not possible, and thus we shall assume the array to be formed is missing some border cells, as is shown in Figure 9.) We first use divide-and-conquer to assign each cell a number from 1 to $M$. We chop the $M$-cell array in half vertically into two subarrays with $\lfloor M/2 \rfloor$ and $\lceil M/2 \rceil$ cells. We recursively assign numbers from 1 to $\lfloor M/2 \rfloor$ to the first subarray and numbers from $\lfloor M/2 \rfloor + 1$ to $M$ to the second subarray, alternating the orientation of the cut between horizontal and vertical at each recursive step.


Figure 9. A 6-by-8 array that is missing some border cells.
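The divide-and-conquer numbering and the left-to-right assignment can be sketched as follows. The names `dc_numbering` and `assign` are hypothetical, cells are (row, column) pairs, and a cut takes the first $\lfloor M/2 \rfloor$ cells in column-major order (vertical cut) or row-major order (horizontal cut). The sketch also assumes that the left-to-right order of leaves in the tree of meshes coincides with the same recursive bisection applied to the wafer’s own cell positions.

```python
def dc_numbering(cells, vertical=True, start=1):
    """Number the cells of an array start..start+M-1 by recursive bisection.

    cells: iterable of (row, col) positions (the array may be missing some
    border cells). A vertical cut keeps the first floor(M/2) cells in
    column-major order; a horizontal cut uses row-major order. Cut
    orientations alternate at each level.
    """
    key = (lambda rc: (rc[1], rc[0])) if vertical else (lambda rc: rc)
    cells = sorted(cells, key=key)
    if len(cells) <= 1:
        return {cell: start for cell in cells}
    half = len(cells) // 2
    numbering = dc_numbering(cells[:half], not vertical, start)
    numbering.update(dc_numbering(cells[half:], not vertical, start + half))
    return numbering

def assign(array_cells, wafer_cells, live):
    """Map array cell number t to the t-th live wafer cell in leaf order.

    live maps each wafer position to True (live) or False (dead); there must
    be at least as many live cells as array cells.
    """
    array_number = dc_numbering(array_cells)
    wafer_number = dc_numbering(wafer_cells)
    leaves = sorted((cell for cell in wafer_cells if live[cell]), key=wafer_number.get)
    return {cell: leaves[t - 1] for cell, t in array_number.items()}

square = [(r, c) for r in range(2) for c in range(2)]
wafer = [(r, c) for r in range(2) for c in range(4)]
live = {cell: cell[1] % 2 == 0 for cell in wafer}  # odd columns are dead
print(dc_numbering(square))
print(assign(square, wafer, live))
```

In this small example the dead odd columns are skipped, and each pair of adjacent array cells lands on live wafer cells that are at most two columns apart.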

The assignment is now simple. The $t$th cell of the array is mapped to the $t$th live leaf of the tree of meshes counting from left to right. After swelling the channel capacities by a small constant factor to accommodate the wires, adjacent cells can be connected by routing wires through the unique path in the underlying complete binary tree. Routing through the meshes can be done by treating them as crosspoint switches. The wire lengths are $O(\sqrt{N}\lg N)$ since we need to route across $O(\sqrt{N})$ channels of width $O(\lg N)$.

As a practical matter, the tree of meshes need not be used directly for routing wires. The assignment algorithm can be used to establish the correspondence between the two-dimensional array and the live cells of the wafer, and then the wires can be routed using a standard gate-array routing program. In the case when $\sqrt{M}$ is an exact power of 2, the assignment is particularly simple. The $k$th live cell corresponds to the $(i, j)$ position of the array, where $i$ is obtained by concatenating the even bits of the binary representation of $k$, and $j$ is obtained by concatenating the odd bits.
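The bit-splitting rule can be written in a few lines. This sketch assumes $k$ is counted from 0 and that bit 0 is the least significant bit; `cell_position` is a hypothetical name, and `bits` is $\lg\sqrt{M}$, the number of bits in each coordinate.

```python
def cell_position(k, bits):
    """Array position (i, j) of the k-th live cell, k counted from 0.

    i collects the even-position bits of k (bit 0 = least significant) and
    j the odd-position bits; bits = log2(sqrt(M)) bits per coordinate.
    """
    i = j = 0
    for b in range(bits):
        i |= ((k >> (2 * b)) & 1) << b
        j |= ((k >> (2 * b + 1)) & 1) << b
    return i, j

# A 4-by-4 array (M = 16, sqrt(M) = 4 = 2**2).
print([cell_position(k, 2) for k in range(6)])  # [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0), (3, 0)]
```

Each consecutive block of four values of $k$ fills a 2-by-2 quadrant, which is why this interleaving reproduces the alternating cuts of the general divide-and-conquer numbering when $\sqrt{M}$ is a power of 2.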

4.2.  The  divide-and-conquer  method 

The tree-of-meshes algorithm works as well as might be expected in the worst case, and thus it is natural to wonder how well it works on average. Unfortunately, the algorithm works poorly in a probabilistic model because the maximum wire length is nearly always large. This section presents a similar divide-and-conquer algorithm which works poorly in the worst case, but which can be proved to work extremely well on average. With high probability, the algorithm connects all the live cells of an $N$-cell wafer with channels of width $O(\lg\lg N)$ using wires of length $O(\lg N \lg\lg N)$.

The divide-and-conquer algorithm has two stages. In the first stage, the wafer is recursively
bisected, and the number of live cells in each half is counted. Based on the count of live cells in
each half of the wafer, the algorithm computes the dimensions of the two subarrays that must be
constructed, and then recursively constructs the subarrays. The two subarrays are then linked
together to form the complete array. The algorithm remains in the first stage as long as the
distribution of cells within the current region of the wafer is good, which (with high probability)
is until subproblems with $O(\lg N)$ cells are encountered. Below this point, the distribution of
cells can be arbitrarily bad, and thus the algorithm uses the tree-of-meshes technique to complete
the wiring of an $O(\lg N)$-cell subarray. The exact crossover point between the first and second
stages can be set at subproblems of size $c\lg N$, where $c$ is any constant sufficiently large to
ensure that, with high probability, every $c\lg N$-cell region contains $\Omega(\lg N)$ live cells.
That such a $c$ exists is a consequence of Fact 8.
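The first stage can be illustrated with a small Python sketch. It rests on assumptions of our own rather than on details given in the paper: the wafer is a boolean grid `live`, the target array is a list of `(row, col)` positions, and at each cut the first k target positions in column-major order (vertical cut) or row-major order (horizontal cut) go to the first half, where k is that half's live-cell count. The second stage is replaced by a placeholder that simply pairs targets with live cells in row-major order instead of running the tree-of-meshes construction. All function names are hypothetical.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]  # (row, col)


def split_targets(targets: List[Cell], k: int, vertical: bool) -> Tuple[List[Cell], List[Cell]]:
    """Give the first k target positions to the first half of the wafer.

    Assumed rule: for a vertical cut, order targets column by column;
    for a horizontal cut, row by row.
    """
    if vertical:
        ordered = sorted(targets, key=lambda t: (t[1], t[0]))
    else:
        ordered = sorted(targets)
    return ordered[:k], ordered[k:]


def divide_and_conquer(live: List[List[bool]], targets: List[Cell],
                       r0: int, r1: int, c0: int, c1: int,
                       vertical: bool, base_size: int) -> Dict[Cell, Cell]:
    """Assign each target position to a live cell in rows r0..r1-1, columns c0..c1-1."""
    region = [(r, c) for r in range(r0, r1) for c in range(c0, c1) if live[r][c]]
    assert len(region) == len(targets), "target size must equal live-cell count"
    height, width = r1 - r0, c1 - c0
    if height * width <= base_size:
        # Placeholder second stage: pair targets and live cells in row-major order.
        return dict(zip(sorted(targets), region))
    # If the preferred cut is impossible (a strip one cell wide), cut the other way.
    if vertical and width < 2:
        vertical = False
    elif not vertical and height < 2:
        vertical = True
    result: Dict[Cell, Cell] = {}
    if vertical:
        mid = (c0 + c1) // 2
        k = sum(live[r][c] for r in range(r0, r1) for c in range(c0, mid))
        first, second = split_targets(targets, k, vertical=True)
        result.update(divide_and_conquer(live, first, r0, r1, c0, mid, False, base_size))
        result.update(divide_and_conquer(live, second, r0, r1, mid, c1, False, base_size))
    else:
        mid = (r0 + r1) // 2
        k = sum(live[r][c] for r in range(r0, mid) for c in range(c0, c1))
        first, second = split_targets(targets, k, vertical=False)
        result.update(divide_and_conquer(live, first, r0, mid, c0, c1, True, base_size))
        result.update(divide_and_conquer(live, second, mid, r1, c0, c1, True, base_size))
    return result


def max_wire_length(assign: Dict[Cell, Cell]) -> int:
    """Longest Manhattan distance between cells assigned to adjacent target positions."""
    longest = 0
    for (i, j), (r, c) in assign.items():
        for nb in ((i + 1, j), (i, j + 1)):
            if nb in assign:
                r2, c2 = assign[nb]
                longest = max(longest, abs(r - r2) + abs(c - c2))
    return longest


# Example: an 8-by-8 wafer with 36 live cells, integrated into a 6-by-6 target.
live = [[(8 * r + c) % 16 < 9 for c in range(8)] for r in range(8)]
targets = [(i, j) for i in range(6) for j in range(6)]
assign = divide_and_conquer(live, targets, 0, 8, 0, 8, vertical=True, base_size=4)
print(len(assign), max_wire_length(assign))
```

Because each half receives exactly as many target positions as it has live cells, every live cell ends up in the array; other splitting rules can be tried by changing `split_targets` alone.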

Figures 10 through 13 illustrate the divide-and-conquer procedure. Figure 10a shows a 64-cell
wafer which contains 36 live cells. In what follows, we step through the algorithm as it constructs
a 6-by-6 array, which is identified as the “overall target” in Figure 10b.
Figure 10a. A 64-cell wafer that contains 36 live cells.


Figure  10b.  The  target:  a  6-by-6  systolic  array. 




The first step is to bisect the wafer vertically, which gives 19 live cells in the left half and 17
in the right. We wish to construct a 19-cell subarray in the left half of the wafer and a 17-cell
subarray in the right half. Since we want the two subarrays to fit together nicely after they have
been constructed, we choose their shapes according to the partition of the 6-by-6 array shown in
Figure 11.


Figure  11.  Partitioning  the  target. 

We now invoke the procedure recursively on the two subarrays, but this time we bisect each of the
halves horizontally. For example, when the left half of the wafer is bisected, the 19 live cells are
divided into 9 cells above and 10 cells below, as displayed in Figure 12. The algorithm continues in
this fashion, alternating between horizontal and vertical divisions, until the wafer and the target
have been partitioned into $O(\lg N)$-cell regions, at which point the algorithm proceeds to the
second stage and applies the tree-of-meshes technique.


Figure 12. Partitioning the left target.


In this example the number of cells is small enough that the second-stage construction can be
performed by inspection. The inspection strategy can also be used effectively in practice: since the
second stage operates on regions of size $O(\lg N)$, the routings for regions of this size can
conceivably be precomputed, in which case the second stage consists of a single table lookup.
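To make the table-lookup idea concrete, here is a hedged Python sketch. The brute-force search over permutations and the use of `functools.lru_cache` as the "table" are our own illustrative choices, practical only for very small regions; the name `best_routing` is hypothetical, and wire length is measured as the Manhattan distance between assigned cells, ignoring how wires are routed in the channels.

```python
from functools import lru_cache
from itertools import permutations
from typing import Optional, Tuple

Pattern = Tuple[Tuple[bool, ...], ...]  # True marks a live cell in a small region
Cell = Tuple[int, int]


@lru_cache(maxsize=None)  # each distinct (pattern, shape) is solved once, then looked up
def best_routing(pattern: Pattern, rows: int, cols: int) -> Tuple[int, Tuple[Cell, ...]]:
    """Exhaustively place a rows-by-cols target array on the live cells of `pattern`.

    Returns (longest Manhattan wire, live cells listed in row-major target order).
    """
    live = [(r, c) for r, row in enumerate(pattern) for c, ok in enumerate(row) if ok]
    if len(live) != rows * cols:
        raise ValueError("target size must equal the number of live cells")
    best: Optional[Tuple[int, Tuple[Cell, ...]]] = None
    for perm in permutations(live):
        longest = 0
        for i in range(rows):
            for j in range(cols):
                r, c = perm[i * cols + j]
                if j + 1 < cols:  # horizontal neighbour in the target array
                    r2, c2 = perm[i * cols + j + 1]
                    longest = max(longest, abs(r - r2) + abs(c - c2))
                if i + 1 < rows:  # vertical neighbour in the target array
                    r2, c2 = perm[(i + 1) * cols + j]
                    longest = max(longest, abs(r - r2) + abs(c - c2))
        if best is None or longest < best[0]:
            best = (longest, perm)
    assert best is not None
    return best


region = ((True, False, True), (True, True, True))  # one dead cell
print(best_routing(region, 1, 5))
```

A real table would be filled offline for every live-cell pattern of the chosen region size; here the cache fills lazily, so a pattern that recurs on the wafer is only solved once.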

Figure 13 shows the final solution to the problem in Figure 10. For clarity, the wires have not been
routed within the channels of the wafer. Notice that each quadrant contains the specified targets for
the second level of recursion. The dashed lines represent wires that connect cells in different
quadrants of the wafer.

Figure 13. Completed cell assignment and wiring of the 6-by-6 array.

With probability $1 - O(1/N)$, the divide-and-conquer method can construct a two-dimensional array
from all the live cells on an $N$-cell wafer using wires of length $O(\lg N \lg\lg N)$ and channels of
width $O(\lg\lg N)$. It is not too difficult to see that these bounds hold with probability 1 for the
regions of size less than $c\lg N$ that are connected by the tree-of-meshes procedure. Plugging in
$c\lg N$ for $N$ in the tree-of-meshes bound yields wires of length $O(\sqrt{\lg N}\,\lg\lg N)$ and
channels of width $O(\lg\lg N)$.
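Written out, the substitution uses the worst-case tree-of-meshes bounds for an $n$-cell region, wires of length $O(\sqrt{n}\,\lg n)$ and channels of width $O(\lg n)$ (the unit-width entries for the tree of meshes in Table IIa), with $n = c\lg N$ and $c$ a constant:

```latex
\begin{aligned}
\sqrt{n}\,\lg n \,\Big|_{n = c\lg N}
  &= \sqrt{c}\,\sqrt{\lg N}\,\bigl(\lg c + \lg\lg N\bigr)
   = O\bigl(\sqrt{\lg N}\,\lg\lg N\bigr),\\
\lg n \,\Big|_{n = c\lg N}
  &= \lg c + \lg\lg N = O(\lg\lg N).
\end{aligned}
```

Both quantities lie within the overall bounds of $O(\lg N \lg\lg N)$ for wire length and $O(\lg\lg N)$ for channel width.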

The hard part is showing that the wiring in the upper levels of recursion satisfies the bounds. The
analysis, which we briefly sketch, assumes that during the recursion, the channel dividing a subwafer
with $m > c\lg N$ cells has width $O(\sqrt{\lg N \lg m})$. Uniform channel widths of $O(\lg\lg N)$
across the entire wafer can later be obtained by distributing the wider channels across neighboring
channels, which does not asymptotically increase the wire lengths in the subsequent analysis.

We begin at the first level of recursion. Consider the wires that link a cell in the left subarray to
a cell in the right subarray, as illustrated by the two examples in Figure 14. For the most part, the
connecting wires can be routed in the channel that separates the left and right halves of the wafer.
The length of the longest wire in the channel is proportional to the longest vertical distance that a
single wire must traverse, as is the width of the channel itself.

Figure 14a. A distribution of live cells which might allow a narrow center channel.

Figure 14b. A distribution of live cells which requires a wide center channel.

The length of the longest wire in the center channel depends on the distribution of cells in each
quadrant. For example, if we are extremely lucky and the live cells are regularly spaced, the longest
wire may have constant length, as in Figure 14a. But if we are very unlucky, half the live cells might
occur in the upper right quadrant and the other half in the lower left quadrant (Figure 14b). To
connect the two halves in this latter case, some wire must have length $\Omega(\sqrt{N})$.

The length of the longest wire in the center channel can also be influenced by the distribution of
cells within a quadrant. For example, if the upper left quadrant contains $N/8$ live cells (about the
right number), but they are distributed as in Figure 15, then the center channel still contains a
wire of length $\Omega(\sqrt{N})$.

Most often, we are not so unlucky that a wire in the center channel has length $\Omega(\sqrt{N})$,
but neither are we lucky enough that all wires have constant length. With high probability, we are
closer to lucky than unlucky: the longest wire in the center channel has length $O(\lg N)$. The idea
is that the live cells are distributed so evenly that, with high probability, the total vertical
distortion of the wires in the center channel (over all subproblems of size $\Omega(\lg N)$) is
$O(\lg N)$. For channels dividing a subwafer of size $m > c\lg N$, the vertical distortion is
$O(\sqrt{\lg m \lg N})$. Thus, the channel width bounds assumed earlier suffice.

Figure 15. Another distribution of live cells which requires a wide center channel.

The wire length analysis of the divide-and-conquer algorithm is fairly tight. For example, the
algorithm requires wires of length $\Omega(\lg N)$ with high probability. Thus, if the lower bound of
$\Omega(\sqrt{\lg N})$ is to be achieved, a different algorithm must be discovered. It may be possible
to improve the channel width bound, however. For example, any improvement in the worst-case bound
given by the tree-of-meshes technique would lead directly to an improvement in the channel width
bounds for the divide-and-conquer algorithm.


4.3.  The  patching  method 

Not surprisingly, we can improve the wire length bounds if we need only construct a two-dimensional
array from most of the live cells on a wafer. In particular, we can use a scheme similar to the
patching scheme from Section 3.1 to construct a two-dimensional array from any constant fraction
(less than 1) of the live cells on an $N$-cell wafer using wires of length
$O(\sqrt{\lg N}\,\lg\lg N)$ and channels of width $O(\lg\lg N)$. These bounds are also achieved with
high probability.

The key idea is to partition the wafer into $N/(c\lg N)$ square regions, each containing
$m = c\lg N$ cells. According to Fact 8, we can choose $c$ sufficiently large that, with probability
$1 - O(1/N)$, each of the regions contains at least $m' = \frac{c}{2}\lg N - \sqrt{c}\,\lg N$ live
cells. Using the tree-of-meshes technique, we can therefore construct an $m'$-cell two-dimensional
array in each region using wires of length $O(\sqrt{m}\,\lg m) = O(\sqrt{\lg N}\,\lg\lg N)$ and
channels of width $O(\lg m) = O(\lg\lg N)$. The $N/(c\lg N)$ two-dimensional arrays are then
connected together into one large array with $\frac{N}{2}(1 - 2/\sqrt{c})$ live cells. The added
wires also have length at most $O(\sqrt{\lg N}\,\lg\lg N)$, and can easily fit into the
$O(\lg\lg N)$-width channels.
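The parameter arithmetic above can be checked numerically with the Python sketch below. It assumes, as the count $\frac{N}{2}(1 - 2/\sqrt{c})$ implies, that each cell is live independently with probability 1/2; the helper names and the Monte Carlo check of the per-region guarantee are illustrative additions, not part of the method.

```python
import math
import random


def patching_parameters(N: int, c: float) -> dict:
    """Region size m, per-region live-cell quota m', and the size of the final array."""
    lg_n = math.log2(N)
    m = c * lg_n                                    # cells per region
    m_prime = (c / 2) * lg_n - math.sqrt(c) * lg_n  # live cells used in each region
    regions = N / m                                 # number of square regions
    return {"m": m, "m_prime": m_prime, "regions": regions,
            "cells_used": regions * m_prime,
            "closed_form": (N / 2) * (1 - 2 / math.sqrt(c))}


def shortfall_rate(m: int, m_prime: float, trials: int, seed: int = 1) -> float:
    """Fraction of simulated m-cell regions (each cell live w.p. 1/2) with fewer than m' live cells."""
    rng = random.Random(seed)
    short = sum(1 for _ in range(trials)
                if bin(rng.getrandbits(m)).count("1") < m_prime)
    return short / trials


p = patching_parameters(N=2**20, c=16)
print(p)  # m = 320.0, m_prime = 80.0, cells_used approximately equal to closed_form = N/4
print(shortfall_rate(round(p["m"]), p["m_prime"], trials=2000))
```

With $c = 16$ the array keeps about half of the expected $N/2$ live cells; larger $c$ pushes the fraction $1 - 2/\sqrt{c}$ toward 1 at the cost of larger regions.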

The patching method can be thought of as a refinement of the divide-and-conquer method that throws
away a fraction of the cells at each level of the recursion. The actual decisions as to which cells
at a given level are thrown away can be postponed until lower in the recursion, but it is important
that at each level, every region of the wafer have exactly the same number of live cells.


4.4.  Greene’s  method 

The next method, due to Greene [7], also connects any constant fraction of the live cells on an
$N$-cell wafer into a two-dimensional array. With high probability, it uses wires of length
$O(\sqrt{\lg N})$ and channels of constant width, thus achieving the lower bound for integration of
two-dimensional arrays. It is similar to the algorithm presented at the beginning of this section in
that it creates rows of the array, but it is considerably more clever. The algorithm that determines
the rows and columns of the array is based on network flow techniques, but we present it in a manner
that does not require a knowledge of combinatorial optimization.

Greene’s algorithm can construct a $(1-\epsilon)\sqrt{N}$-by-$\frac{1}{2}(1-\epsilon)\sqrt{N}$ array,
for any constant $\epsilon > 0$. For any such $\epsilon$, we require the $N$-cell wafer to have
channels of width $w$, where $w$ is a sufficiently large constant that depends on $\epsilon$. The
higher the percentage of cells we wish to integrate into an array, the wider we must make the
channels.

Partition the wafer as shown in Figure 16 into blocks of size 1-by-$c_1\sqrt{\lg N}$ such that there
are $\sqrt{N}$ rows of blocks and $\sqrt{N}/(c_1\sqrt{\lg N})$ columns of blocks, where $c_1$ is a
constant depending on $\epsilon$. Mark a block as bad if it contains fewer than $t$ live cells, and
good otherwise, where $t$ is also a constant depending on $\epsilon$. For the exact values of the
constants, we refer the reader to [7].

Figure 16. Forming the tentative rows in Greene’s method. Blocks containing fewer than $t$ live cells
are marked with solid X’s. Blocks marked as bad during the scan are marked with dashed X’s.
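Block marking is a simple count. In this Python sketch, `block_width` stands in for $c_1\sqrt{\lg N}$ rounded to an integer and `t` for the threshold; the actual constants are given in [7] and are not reproduced here, so the values in the example are arbitrary, and the function name `mark_blocks` is hypothetical.

```python
from typing import List


def mark_blocks(live: List[List[bool]], block_width: int, t: int) -> List[List[bool]]:
    """good[i][j] is True iff the 1-by-block_width block in cell row i, covering
    columns j*block_width .. (j+1)*block_width - 1, has at least t live cells.

    Assumes block_width divides the wafer width.
    """
    width = len(live[0])
    if width % block_width:
        raise ValueError("block_width must divide the wafer width")
    return [[sum(row[j * block_width:(j + 1) * block_width]) >= t
             for j in range(width // block_width)]
            for row in live]


wafer = [[True, True, False, True],
         [False, False, True, False]]
print(mark_blocks(wafer, block_width=2, t=2))  # [[True, False], [False, False]]
```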

The  first  part  of  the  algorithm  determines  tentative  rows  for  the  array.  We  divide  the  w 
vertical  tracks  between  blocks  on  the  wafer  into  two  bundles,  each  consisting  of  w/2  tracks.  For 
this  part  of  the  algorithm,  we  will  treat  the  two  bundles  as  two  routing  tracks.  Later,  we  will 
need  to  reexpand  the  capacity  of  the  two  tracks  by  w/2  each. 

The algorithm first determines $(1-\epsilon)\sqrt{N}$ horizontally running chains from the left edge
of the wafer to the right edge through the good blocks. The chains must satisfy the constraint that
no wire is longer than $c_2\sqrt{\lg N}$, for some constant $c_2$ depending on $\epsilon$. The
algorithm determines the chains in the following manner. Scan the columns of blocks left to right.
For each column, proceed through the blocks from top to bottom. At each point, if the current block is
good, we attempt to connect it to a good block on the left. This connection is made to the uppermost
good block within distance $c_2\sqrt{\lg N}$, up or down, from the current block that has not yet
been connected to a block in the current column. It must also satisfy the constraint that the routing
does not exceed the channel capacity of 2. If such a connection cannot be made, we mark the current
block as bad. Block (5,2) in Figure 16 is marked bad for this reason. Some chains are terminated by
this procedure (for example, the chain ending in block (3,2) of the figure). With high probability,
however, this procedure establishes $(1-\epsilon)\sqrt{N}$ horizontal chains, each with
$\sqrt{N}/(c_1\sqrt{\lg N})$ blocks.
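The scan can be sketched in Python as follows. This is a simplified, hypothetical rendering: `good` is a block grid such as the marking step produces (it is updated in place when blocks are marked bad), `reach` plays the role of $c_2\sqrt{\lg N}$ measured in block rows, and the channel-capacity test is omitted, so the sketch may accept connections that Greene’s method would reject.

```python
from typing import List, Optional


def form_row_chains(good: List[List[bool]], reach: int) -> List[List[int]]:
    """Simplified left-to-right scan for tentative rows (channel capacity ignored).

    good[i][j] says whether the block in block row i, block column j is good; it is
    set to False in place when a block cannot be connected. Returns the chains that
    reach the right edge, each as the list of block rows it visits, one per column.
    """
    rows, cols = len(good), len(good[0])
    parent: List[List[Optional[int]]] = [[None] * cols for _ in range(rows)]
    for j in range(1, cols):                 # columns left to right
        used = set()                         # left-hand blocks already taken in this column
        for i in range(rows):                # blocks top to bottom
            if not good[i][j]:
                continue
            choice = None
            for k in range(max(0, i - reach), min(rows, i + reach + 1)):  # uppermost first
                if good[k][j - 1] and k not in used:
                    choice = k
                    break
            if choice is None:
                good[i][j] = False           # no connection possible: mark the block bad
            else:
                used.add(choice)
                parent[i][j] = choice
    chains: List[List[int]] = []
    for i in range(rows):
        if good[i][cols - 1]:
            chain, r = [i], i
            for j in range(cols - 1, 0, -1):  # walk back to the left edge
                prev = parent[r][j]
                assert prev is not None       # every surviving good block has a left link
                chain.append(prev)
                r = prev
            chains.append(chain[::-1])
    return chains


blocks = [[True, True, True],
          [False, True, True],
          [True, False, True]]
print(form_row_chains(blocks, reach=1))  # [[0, 0, 0], [2, 1, 1]]
```

In the example, block (2, 2) finds no unused good neighbour within reach and is marked bad, so the chain through row 2 is terminated, mirroring the behaviour described for Figure 16.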

The horizontally running chains can be viewed conceptually as shown in Figure 17. We now expand the
blocks in the chains to see their internal structure, as shown in Figure 18. The horizontal tracks in
Figure 18 actually correspond to sections of both horizontal and vertical tracks in Figure 16,
because the chains run both horizontally and vertically. The horizontal channels in Figure 18 have
$w/2$ tracks, and thus the two vertical tracks between blocks in Figure 16 must each be expanded by
$w/2$ to accommodate the wires we shall now route to make the vertical connections.

Figure 17. Normalized view of the rows of blocks.

Figure 18. Forming the columns in Greene’s method. Dead cells are marked with solid X’s. Cells marked
as bad during the scan are marked with dashed X’s.

We establish the vertically running chains by essentially the same procedure as before, except that
we scan top to bottom and route through horizontal channels of width $w/2$. With high probability,
the algorithm constructs $\frac{1}{2}(1-\epsilon)\sqrt{N}$ vertical chains. The horizontal chains are
now modified to include only those cells used in the vertical chains, which completes construction of
the $(1-\epsilon)\sqrt{N}$-by-$\frac{1}{2}(1-\epsilon)\sqrt{N}$ array. All channels have constant
width $w$, and it turns out that all wire lengths are $O(\sqrt{\lg N})$.

Greene’s method generates a rectangular array with aspect (length to width) ratio 2, but we may wish
to realize a square array without throwing away half the cells. By embedding a
$(1-\epsilon)\sqrt{N/2}$-by-$(1-\epsilon)\sqrt{N/2}$ square array into a
$(1-\epsilon)\sqrt{N}$-by-$\frac{1}{2}(1-\epsilon)\sqrt{N}$ rectangular array so that adjacent cells
of the square array are a constant distance apart in the rectangular array, we can use Greene’s
method directly. The first row of the square is embedded in the first two rows of the rectangle such
that all of the first row of the rectangle is used and an evenly spaced portion of the second row is
used. We connect the cells of the first row of the square linearly left to right in the rectangle.
The second row of the square is embedded linearly in the second and third rows of the rectangle using
all the remaining cells in the second row and a uniformly spaced portion of cells in the third row.
The third row of the square uses all the remaining cells in the third row of the rectangle, all the
cells in the fourth row, and a uniformly spaced portion of cells from the fifth row. We continue in
this fashion until the embedding is completed. Every adjacent pair of cells in the square array is
within horizontal and vertical distances of four cells in the rectangular array. This procedure can
be generalized to construct a rectangular array of any aspect ratio.
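The row-by-row embedding can be sketched in Python. The sketch generalizes the text slightly and rests on assumptions of ours: rectangle rows hold `width` cells, the square has side `side` with `width <= side` and `side * side` a multiple of `width`, and the "evenly spaced portion" of a row is chosen by the rule in the hypothetical helper `evenly_spaced`. The example uses a square of side 12 and rectangle rows of 8 cells, which is not the ratio in the text; the offsets it prints can be compared with the four-cell bound for that instance only.

```python
from typing import Dict, List, Tuple

Pos = Tuple[int, int]


def evenly_spaced(n: int, k: int) -> List[int]:
    """k distinct, roughly evenly spaced indices from range(n), for 0 <= k <= n."""
    return [((2 * i + 1) * n) // (2 * k) for i in range(k)]


def embed_square(side: int, width: int) -> Dict[Pos, Pos]:
    """Map square-array positions (i, j) to rectangle positions (row, col)."""
    if width > side or (side * side) % width:
        raise ValueError("need width <= side and width dividing side*side")
    placement: Dict[Pos, Pos] = {}
    row, free = 0, list(range(width))        # unused columns of rectangle row `row`
    for i in range(side):
        cells = [(row, c) for c in free]     # all remaining cells of the current row
        need = side - len(cells)
        row, free = row + 1, list(range(width))
        while need >= width:                 # whole rows, if the square row is that long
            cells += [(row, c) for c in free]
            need -= width
            row += 1
        if need > 0:                         # evenly spaced part of the next row
            picked = evenly_spaced(width, need)
            cells += [(row, c) for c in picked]
            free = [c for c in range(width) if c not in picked]
        cells.sort(key=lambda rc: (rc[1], rc[0]))  # link the square row left to right
        for j, rc in enumerate(cells):
            placement[(i, j)] = rc
    return placement


def max_offsets(placement: Dict[Pos, Pos]) -> Tuple[int, int]:
    """Largest horizontal and vertical offsets between square-adjacent cells."""
    dx = dy = 0
    for (i, j), (r, c) in placement.items():
        for nb in ((i + 1, j), (i, j + 1)):
            if nb in placement:
                r2, c2 = placement[nb]
                dy = max(dy, abs(r - r2))
                dx = max(dx, abs(c - c2))
    return dx, dy


placement = embed_square(side=12, width=8)
print(max_offsets(placement))
```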

4.5.  The  matching  method 

We conclude with a method whose proven bounds are not as good as those presented thus far, but which
is nevertheless interesting. In the case of widthless wires, this method, which is based on bipartite
matching in a graph, can integrate all the live cells on an $N$-cell wafer with wires of length
$O(\lg^{3/4} N)$. When we consider the normal case of unit-width wires, however, we could conceivably
need channels of width $O(\lg^{3/4} N)$, and because the wires would need to cross these channels,
wires of length $O(\lg^{3/2} N)$. This algorithm is certainly worth considering when wire widths are
small, because the $O(\lg^{3/4} N)$ wire-length bound is better than the $\Theta(\lg N)$ bound that
the divide-and-conquer method yields for widthless wires. Moreover, the true performance of the
matching method might be better than that suggested by the upper bound for unit-width wires. In
comparison, the divide-and-conquer method has a hard lower bound of $\Omega(\lg N)$ even for
widthless wires. In addition, the algorithm is easily tailored to handle the situation when we wish
to integrate any constant fraction of the live cells, in which case the widthless wire bound shrinks
to $O(\sqrt{\lg N})$, which is optimal.

The first step of the matching method is to determine the number $M$ of live cells on an $N$-cell
wafer. Then we pick a target wire length $d$ that we hope to achieve. The algorithm now determines the
locations of points in a uniform $\sqrt{M}$-by-$\sqrt{M}$ grid superimposed on the wafer. It then
constructs a bipartite graph between the grid points and the live cells of the wafer, with an edge
between a grid point and a live cell if the distance between them is at most $d$. Then, using a
bipartite matching algorithm [5], the procedure determines whether every grid point can be matched
one-to-one with a live cell. If a perfect matching exists, then we know that a routing of the
corresponding assignment with widthless wires has maximum edge length $d$.
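A small Python sketch of the feasibility test for one value of $d$ follows. Several details are assumptions of ours rather than the paper's: distance is Manhattan distance, grid points sit at the centres of a $k$-by-$k$ subdivision of an $n$-by-$n$ wafer (with $k = \sqrt{M}$ assumed to be an integer), and the matching is found with Kuhn's simple augmenting-path algorithm, which may differ from the algorithm of [5].

```python
from typing import List, Optional, Tuple

Cell = Tuple[int, int]


def grid_points(n: int, k: int) -> List[Tuple[float, float]]:
    """Centres of a k-by-k grid superimposed on an n-by-n wafer, in cell coordinates."""
    step = n / k
    return [((a + 0.5) * step - 0.5, (b + 0.5) * step - 0.5)
            for a in range(k) for b in range(k)]


def perfect_matching(points: List[Tuple[float, float]], cells: List[Cell],
                     d: float) -> Optional[List[int]]:
    """Kuhn's augmenting-path matching; a grid point and a live cell are adjacent
    when their Manhattan distance is at most d. Returns, for each grid point, the
    index of its live cell, or None if no perfect matching exists."""
    adj = [[idx for idx, (r, c) in enumerate(cells) if abs(r - pr) + abs(c - pc) <= d]
           for (pr, pc) in points]
    owner = [-1] * len(cells)  # owner[idx] = grid point currently matched to cell idx

    def augment(p: int, seen: set) -> bool:
        for idx in adj[p]:
            if idx not in seen:
                seen.add(idx)
                if owner[idx] == -1 or augment(owner[idx], seen):
                    owner[idx] = p
                    return True
        return False

    for p in range(len(points)):
        if not augment(p, set()):
            return None
    match = [-1] * len(points)
    for idx, p in enumerate(owner):
        if p != -1:
            match[p] = idx
    return match


# Example: a 4-by-4 wafer whose live cells form the upper-left 3-by-3 block (M = 9).
live_cells = [(r, c) for r in range(3) for c in range(3)]
pts = grid_points(n=4, k=3)
print(perfect_matching(pts, live_cells, d=2) is not None)
```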

It is possible to show [19, 30] that if $d = O(\lg^{3/4} N)$, then the matching succeeds with high
probability. As a practical matter, it is better to search for the smallest $d$ that works for a
given wafer using exponential search. Try $d = 1, 2, 4, 8, \ldots$ until a value of $d$ is found that
results in a perfect matching, and then use binary search to find the exact value.
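The exponential-then-binary search can be written generically over a monotone predicate `feasible(d)`, which in this setting would run the perfect-matching test for target length $d$. The function name is hypothetical, and the sketch assumes integer $d$; because distances to grid points need not be integers, a refinement could instead binary-search over the sorted set of candidate distances.

```python
from typing import Callable


def smallest_feasible(feasible: Callable[[int], bool], limit: int = 1 << 30) -> int:
    """Smallest integer d >= 1 with feasible(d) true, assuming feasibility is monotone
    in d (any d at least as large as a feasible one is also feasible)."""
    d = 1
    while not feasible(d):          # exponential phase: 1, 2, 4, 8, ...
        d *= 2
        if d > limit:
            raise ValueError("no feasible d up to limit")
    lo, hi = d // 2, d              # feasible(hi) holds; lo failed (or lo == 0)
    while hi - lo > 1:              # binary phase on the interval (lo, hi]
        mid = (lo + hi) // 2
        if feasible(mid):
            hi = mid
        else:
            lo = mid
    return hi


calls = []
result = smallest_feasible(lambda d: calls.append(d) or d >= 13)
print(result, calls)  # 13 [1, 2, 4, 8, 16, 12, 14, 13]
```

The number of predicate calls, and hence of matching computations, grows only logarithmically in the final value of $d$.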

The same technique can be applied to construct a two-dimensional array from any number $m < M$ of the
$M$ live cells by using a $\sqrt{m}$-by-$\sqrt{m}$ grid. For the case when $m = (1-\epsilon)M$, it can
be shown that wires have length $O(\sqrt{\lg N})$ with high probability.

5.  Summary  and  conclusions 

The content of this paper is taken primarily from [17] and somewhat from [7] and [8]. The algorithms
presented are summarized in Tables I and II. The literature contains many more techniques for
integrating systolic arrays. Manning [22, 23], Hedlund and Snyder [9], Koren [11], and Fussell and
Varman [6] look at the basic problem of constructing arrays from wafers containing faulty cells.
Rosenberg [27, 28], Chung, Leighton, and Rosenberg [3, 4], and Bhatt and Leighton [2] have also
investigated fault tolerance.


Table I
Bounds for One-Dimensional Arrays

Method     Portion of cells used   Maximum wire length        Maximum channel width
patching   all                     $\Theta(\sqrt{\log N})$    $\Theta(1)$
optimal    all                     $\Theta(\sqrt{\log N})$    $\Theta(1)$
tree       99%                     $\Theta(1)$                $\Theta(1)$

Table IIa
Bounds for Two-Dimensional Arrays
(worst-case wafer, using all live cells)

Method           Maximum wire length     Maximum          Maximum wire length
                 (widthless wires)       channel width    (unit-width wires)
tree of meshes   $\Theta(\sqrt{N})$      $O(\log N)$      $O(\sqrt{N}\log N)$
optimal          $\Theta(\sqrt{N})$      $\Omega(1)$      $\Omega(\sqrt{N})$

Table IIb
Bounds for Two-Dimensional Arrays
(average-case wafer, using all live cells)

Method             Maximum wire length       Maximum              Maximum wire length
                   (widthless wires)         channel width        (unit-width wires)
divide & conquer   $\Theta(\log N)$          $O(\log\log N)$      $O(\log N\log\log N)$
matching           $O(\log^{3/4} N)$         $O(\log^{3/4} N)$    $O(\log^{3/2} N)$
optimal            $\Omega(\sqrt{\log N})$   $O(1)$               $\Omega(\sqrt{\log N})$

Table IIc
Bounds for Two-Dimensional Arrays
(average-case wafer, using 99% of the live cells)

Method     Maximum wire length       Maximum               Maximum wire length
           (widthless wires)         channel width         (unit-width wires)
patching   $\Theta(\sqrt{\log N})$   $O(\log\log N)$       $O(\sqrt{\log N}\log\log N)$
Greene     $\Theta(\sqrt{\log N})$   $\Theta(1)$           $\Theta(\sqrt{\log N})$
matching   $\Theta(\sqrt{\log N})$   $O(\sqrt{\log N})$    $O(\log N)$
optimal    $\Theta(\sqrt{\log N})$   $\Theta(1)$           $\Omega(\sqrt{\log N})$